Suppose \(Y_t\) is the observation. The Model says: \[ Y_t \hspace3mm = \hspace3mm m_t + S_t + X_t \] \[ \begin{align} m_t &: \mbox{ Trend Component }\\ S_t &: \mbox{ Seasonal Component }\\ X_t &: \mbox{ Stationary Time Series } \end{align} \] where \(EX_t=0\).
Condition on Seasonal component: \(S_{t+s}=S_t\) and \(\sum_{j=1}^s S_j = 0\).
Accidental deaths in USA., 1973 to 1978 from Brockwell
acf1 <- acf; library(TSA); acf <- acf1 #- Load TSA package
D <- read.csv("http://gozips.uakron.edu/~nmimoto/pages/datasets/acci.csv", header=T)
Acci <- ts(D, start=c(1973,1), freq=12) #- Turn D into ts object with frequency
plot(Acci, type='o', ylab="num of accidents")## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov
## 1973 9007 8106 8928 9137 10017 10826 11317 10744 9713 9938 9161
## 1974 7750 6981 8038 8422 8714 9512 10120 9823 8743 9129 8710
## 1975 8162 7306 8124 7870 9387 9556 10093 9620 8285 8433 8160
## 1976 7717 7461 7776 7925 8634 8945 10078 9179 8037 8488 7874
## 1977 7792 6957 7726 8106 8890 9299 10625 9302 8314 8850 8265
## 1978 7836 6892 7791 8129 9115 9434 10484 9827 9110 9070 8633
## Dec
## 1973 8927
## 1974 8680
## 1975 8034
## 1976 8647
## 1977 8796
## 1978 9240
#--- plot with month ---
plot(Acci, type="l", ylab="num of accidents")
points(y=Acci, x=time(Acci), pch=as.vector(season(Acci))) Oil Filter Sales Data (Cryer p7) inside TSA package.
data(oilfilters) # load data inside TSA package
Oil <- oilfilters
is.ts(Oil) # is data in TS format?## [1] TRUE
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1983 2385 3302 3958 3302 2441 3107
## 1984 5862 4536 4625 4492 4486 4005 3744 2546 1954 2285 1778 3222
## 1985 5472 5310 1965 3791 3622 3726 3370 2535 1572 2146 2249 1721
## 1986 5357 5811 2436 4608 2871 3349 2909 2324 1603 2148 2245 1586
## 1987 5332 5787 2886 5475 3843 2537
## [1] 12
#--- plot with month ---
plot(Oil, type="l",ylab="Sales")
points(y=Oil, x=time(Oil), pch=as.vector(season(Oil)))\(s\) is the seasonality frequency. (e.g. \(s=12\) for monthly seasonality).
Seasonality repeats every \(s\) observation. \[ S_{t+s}=S_t. \]
Seasonality is not a trend. It’s a temporary deviation from overall trend.
\[
\sum_{j=1}^s S_j = 0.
\]
There are three poplular ways of removing seasonality:
MA filter
Seasonal Average
Seaonal Differencing
If \(s\) is odd, let it be \(2q+1\). Then use linear MA filter. \[ \hat m_t \hspace3mm = \hspace3mm \frac{1}{(2q+1)} \sum_{i=-q}^q X_{t-j} \] If \(s\) is even, let it be \(2q\). Then use \[ \hat m_t \hspace3mm = \hspace3mm \frac{1}{2q} \Big(.5 X_{t-q} + \sum_{i=-q-1}^{q-1} X_{t-j} + .5 X_{t+q}\Big) \]
Because seasonality should sum up to zero for each seasonal cycle, we have estimated trend: \[ \hat m_t \hspace3mm = \hspace3mm \frac{1}{(2q+1)} \sum_{i=-q}^q m_{t-j} + \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q S_{t-j} }_{0} + \underbrace{ \frac{1}{(2q+1)} \sum_{i=-q}^q Y_{t-j} }_{\mbox{ small }}. \] Then estimate for (\(S_t\) + \(Y_t\)) is \[ w_k \hspace3mm = \hspace3mm X_{t} - \hat m_{t} \hspace10mm q < t < n-q. \]
Use this to estimate the seasonal part, \[ \hat S_k \hspace3mm = \hspace3mm w_k - \frac{1}{s} \sum_{i=1}^s w_i. \] Now we have deseasonalized data (\(m_t\) + \(Y_t\)) \[ d_t \hspace3mm = \hspace3mm X_t - \hat S_t \hspace10mm t=1 \cdots n \] Now go back and re-estimate trend using \(d_t\).
t <- 1:100
Y <- -3 +.07*t + arima.sim(n = 100, list(ma = c(.4, .2) ) )
m <- filter(Y, rep(1/9, 9))
res <- Y-m
plot(Y, type="o")
lines(m, lwd=2, col="red") #-- Apply MA filter to Acci data
m.hat <- filter(Acci, c(.5, rep(1,10), .5)/12)
plot(Acci, type="o")
lines(m.hat, col="red")Filter reveals quadtaric trend
#--- Take Monthly Averages
Mav1 <- aggregate(c(Acci), list(month=cycle(Acci)), mean)$x #- 1yr long Mtly Av
M.av1 <- ts(Mav1[cycle(Acci)], start=start(Acci), freq=frequency(Acci)) #- Mtly Av
Ds.Acci <- Acci-M.av1 #- Subtract from original
plot(M.av1, type="o") # Monthly Average## Warning in kpss.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.01 0.885 0.397
Stationary?
## Series: Ds.Acci
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.4056
## s.e. 0.1082
##
## sigma^2 estimated as 66457: log likelihood=-494.53
## AIC=993.07 AICc=993.24 BIC=997.59
## B-L test H0: the sereis is uncorrelated
## M-L test H0: the square of the sereis is uncorrelated
## J-B test H0: the sereis came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.755 0.53 0.351 0.557 0.632 0.768 255.614
## Series: Ds.Acci
## ARIMA(2,0,0) with zero mean
##
## Coefficients:
## ar1 ar2
## 0.6039 0.2916
## s.e. 0.1134 0.1156
##
## sigma^2 estimated as 67065: log likelihood=-501.97
## AIC=1009.94 AICc=1010.29 BIC=1016.77
## B-L test H0: the sereis is uncorrelated
## M-L test H0: the square of the sereis is uncorrelated
## J-B test H0: the sereis came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.438 0.192 0.13 0.789 0.697 0.863 257.122
After we subtracted the seasonality (Monthly average \(S_t\)), We either have ARIMA(0,1,1) or ARIMA(2,0,0).
\[
\begin{align}
Y_t &= \hspace3mm S_t + X_t \\\\
X_t &= \hspace3mm \phi_1 X_{t-1} + \phi_2 X_{t-2} + e_t
\hspace10mm \mbox{(if ARIMA(2,0,0))}
\end{align}
\]
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## Warning in kpss.test(A): p-value greater than printed p-value
## KPSS ADF PP
## p-val: 0.1 0.01 0.01
## Series: diff(diff(Acci, 12))
## ARIMA(0,0,1)(0,0,1)[12] with non-zero mean
##
## Coefficients:
## ma1 sma1 mean
## -0.4963 -0.6146 21.0220
## s.e. 0.1351 0.1932 11.9921
##
## sigma^2 estimated as 98283: log likelihood=-424.27
## AIC=856.53 AICc=857.27 BIC=864.84
## B-L test H0: the sereis is uncorrelated
## M-L test H0: the square of the sereis is uncorrelated
## J-B test H0: the sereis came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.588 0.457 0.453 0.385 0.497 0.355 307.622
We took \(\bigtriangledown_{12}\), then \(\bigtriangledown\), then fit MA(1). Therefore our model is \[ \bigtriangledown \bigtriangledown_{12} Y_t \hspace3mm = \hspace3mm X_t \\ X_t \hspace3mm = \hspace3mm e_t + \theta_1 e_{t-1} + \theta_{12} e_{t-12} \] This is called \[ \mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace3mm \mbox{ model } \\ \\ \mbox{ sARIMA}(0,1,1) (0,1,0)_{12} \hspace3mm \mbox{ model } \]
#--- Take Monthly Averages
D0 <- Oil
Mav1 <- aggregate(c(D0), list(month=cycle(D0)), mean)$x #- 1yr long Mtly Av
M.av <- ts(Mav1[cycle(D0)], start=start(D0), freq=frequency(D0))
Ds.Oil <- D0-M.av #- Subtract from original
plot(M.av, type="o")## Warning in pp.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.015 0.039 0.01
## Series: Ds.Oil
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.8626
## s.e. 0.0696
##
## sigma^2 estimated as 378095: log likelihood=-368.67
## AIC=741.35 AICc=741.62 BIC=745.05
## Series: Ds.Oil
## ARIMA(0,0,0) with zero mean
##
## sigma^2 estimated as 379437: log likelihood=-376.42
## AIC=754.85 AICc=754.93 BIC=756.72
## B-L test H0: the sereis is uncorrelated
## M-L test H0: the square of the sereis is uncorrelated
## J-B test H0: the sereis came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.518 0.525 0.241 0.818 0.804 0.173 622.503
After we subtracted the seasonality (Monthly average \(S_t\)), it was WN. \[ \begin{align} Y_t &= \hspace3mm S_t + X_t \\\\ X_t &= \hspace3mm e_t \end{align} \]
## Warning in adf.test(A): p-value smaller than printed p-value
## Warning in pp.test(A): p-value smaller than printed p-value
## KPSS ADF PP
## p-val: 0.075 0.01 0.01
## Series: diff(Oil, 12)
## ARIMA(0,0,0) with non-zero mean
##
## Coefficients:
## mean
## -217.3889
## s.e. 132.7486
##
## sigma^2 estimated as 652522: log likelihood=-291.57
## AIC=587.14 AICc=587.5 BIC=590.31
## B-L test H0: the sereis is uncorrelated
## M-L test H0: the square of the sereis is uncorrelated
## J-B test H0: the sereis came from Normal distribution
## SD : Standard Deviation of the series
## BL15 BL20 BL25 ML15 ML20 JB SD
## [1,] 0.71 0.645 0.502 0.967 0.993 0.055 807.789
We took \(\bigtriangledown_{12}\), then it looks like a WN. Therefore our model is \[ \bigtriangledown_{12} X_t \hspace3mm = \hspace3mm X_t \\ \\ X_t \hspace3mm = \hspace3mm e_t \] This is called \[ \mbox{ sARIMA}(p,d,q) (P,D,Q)_{12} \hspace3mm \mbox{ model } \\ \\ \mbox{ sARIMA}(0,0,0) (0,1,0)_{12} \hspace3mm \mbox{ model } \]
R code used in this lecture
# 1. Trend with Seasonality
### __Ex: Accident Data
acf1 <- acf; library(TSA); acf <- acf1 #- Load TSA package
D <- read.csv("http://gozips.uakron.edu/~nmimoto/pages/datasets/acci.csv", header=T)
Acci <- ts(D, start=c(1973,1), freq=12) #- Turn D into ts object with frequency
plot(Acci, type='o', ylab="num of accidents")
Acci
#--- plot with month ---
plot(Acci, type="l", ylab="num of accidents")
points(y=Acci, x=time(Acci), pch=as.vector(season(Acci)))
### __Ex: Oil Filter Sales Data
data(oilfilters) # load data inside TSA package
Oil <- oilfilters
is.ts(Oil) # is data in TS format?
Oil # look inside
frequency(Oil) # what is the frequency?
#--- plot with month ---
plot(Oil, type="l",ylab="Sales")
points(y=Oil, x=time(Oil), pch=as.vector(season(Oil)))
# 2. Removing Seasonality
### __Ex: Accidents Data
source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt")
plot(Acci, type="o")
#--- Take Monthly Averages
Mav1 <- aggregate(c(Acci), list(month=cycle(Acci)), mean)$x #- 1yr long Mtly Av
M.av1 <- ts(Mav1[cycle(Acci)], start=start(Acci), freq=frequency(Acci)) #- Mtly Av
Ds.Acci <- Acci-M.av1 #- Subtract from original
plot(M.av1, type="o")
plot(Ds.Acci, type="o")
plot(Acci, type="o")
plot(M.av1, type="o")
plot(Ds.Acci, type="o"); abline(h=0)
Stationarity.tests(Ds.Acci)
### __Ex: Oil Filter Sales
plot(Oil, type="o")
#--- Take Monthly Averages
Mav2 <- aggregate(c(Oil), list(month=cycle(Oil)), mean)$x #- 1yr long Mtly Av $
M.av2 <- ts(Mav2[cycle(Oil)], start=start(Oil), freq=frequency(Oil)) #- Mtly Av as long as D2
Ds.Oil <- Oil-M.av2 #- Subtract from original
plot(M.av2, type="o")
plot(Ds.Oil, type="o")
plot(Oil, type="o")
plot(M.av2, type="o")
plot(Ds.Oil, type="o"); abline(h=0)
Stationarity.tests(Ds.Oil)
library(forecast)
source("http://gozips.uakron.edu/~nmimoto/477/TS_R-90.txt")
auto.arima(Ds.Acci)
auto.arima(Ds.Oil)
Fit1 <- auto.arima(Ds.Acci, d=0)
Fit1
Randomness.tests(Fit1$resid)
Fit2 <- auto.arima(Ds.Oil, d=0)
Fit2
Randomness.tests(Fit2$resid)
### __Ex: Accidents Data
plot(Acci, type='o', ylab="num of accidents")
plot(diff(Acci, 12)) # take seasonal difference (Del_12)
plot(diff(diff(Acci, 12))) # take (Del)(Del_12)
Stationarity.tests(diff(diff(Acci, 12)))
auto.arima(diff(diff(Acci, 12)))
Fit3 <- Arima(diff(diff(Acci, 12)), order=c(0,0,1) )
Randomness.tests(Fit3$resid)
## __Ex: Oil Filter Sales Data
plot(diff(Oil, 12)) # take seasonal difference (Del_12)
Stationarity.tests(diff(Oil, 12))
Fit4 <- auto.arima(diff(Oil, 12))
Fit4
Randomness.tests(Fit4$resid)